PAN 2010: Detecting External Plagiarism - Lab Report for PAN at CLEF 2010

Authors

  • Rafael Corezola Pereira
  • Viviane P. Moreira
  • Renata Galante
Abstract

This paper presents our approach to detecting plagiarism in the PAN'10 competition. We applied a method that targets external plagiarism cases. The method is specifically designed to detect cross-language plagiarism and is composed of five phases: language normalization, retrieval of candidate documents, classifier training, plagiarism analysis, and post-processing. Our group placed seventh in the competition with an overall score of 0.5175. Note that the final score was affected by our low recall (0.4036), which resulted from not detecting the intrinsic plagiarism cases that were also present in the competition corpus.
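
To illustrate how a low recall can pull down an overall score, the sketch below assumes that the overall score is the PAN'10 plagdet measure, i.e. the F1 of precision and recall divided by log2(1 + granularity). The abstract reports only the recall (0.4036) and the overall score (0.5175); the precision and granularity values used in the example are hypothetical placeholders, not results from the paper.

```python
import math


def plagdet(precision: float, recall: float, granularity: float = 1.0) -> float:
    """Overall score, assuming the plagdet definition:
    F1(precision, recall) / log2(1 + granularity).

    granularity >= 1; a value of 1 means each plagiarism case is
    reported as exactly one detection, so the score reduces to F1.
    """
    if granularity < 1.0:
        raise ValueError("granularity must be >= 1")
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


if __name__ == "__main__":
    reported_recall = 0.4036        # from the abstract
    hypothetical_precision = 0.8    # assumption, not reported in the abstract
    hypothetical_granularity = 1.0  # assumption, not reported in the abstract
    score = plagdet(hypothetical_precision, reported_recall, hypothetical_granularity)
    print(f"plagdet under these assumptions: {score:.4f}")
```

Because F1 is a harmonic mean, it stays close to the smaller of its two inputs, so missed intrinsic cases (lower recall) cap the overall score even when precision is high.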

Similar resources

Automatic External Plagiarism Detection Using Passage Similarities - Lab Report for PAN at CLEF 2010

In this paper, we report our approach in detecting external plagiarism. For the pre-processing stage, we identify non-English documents and translate them into English using an online translator tool. Then we index and retrieve the top documents that are similar to the suspicious documents. We divide the retrieved documents into passages where each passage contains twenty sentences. The plagiar...

CoReMo System (Contextual Reference Monotony) - Lab Report for PAN at CLEF 2010

This paper presents a new approach to very fast monolingual external plagiarism detection, based on an altered n-gram concept (contextual n-gram), a new high-precision contextual Information Retrieval engine, and a new pruning strategy (Referential Monotony) for plagiarism detection and its limits. The assessment results can be compared with those achieved by the winning team at PAN...

External Plagiarism Detection: N-Gram Approach Using Named Entity Recognizer - Lab Report for PAN at CLEF 2010

We used named entity (NE) features of source documents to identify their suspicious counterparts. A three-stage identification method was adopted to understand the impact of NEs in plagiarism. Results, along with a brief analysis, are given in this note.

External Plagiarism Detection Based on Standard IR Technology and Fast Recognition of Common Subsequences - Lab Report for PAN at CLEF 2010

The plagiarism detection system described in this paper aims to bring external plagiarism detection to the desktop. The main ideas are to incorporate standard IR technologies for candidate selection and efficient data structures for the detailed analysis between a suspicious document and a candidate document. Given that the system so far has only reached prototype status, the first results l...

A Cluster-Based Plagiarism Detection Method - Lab Report for PAN at CLEF 2010

In this paper we describe a cluster-based plagiarism detection method, which we have used in the learning management system of SCUT to detect plagiarism in network engineering courses. We also used this method to detect external plagiarism in the PAN-10 competition. The method is divided into three steps: the first step, called pre-selecting, is to narrow the scope of detection ...

Publication date: 2010